Internet Archive Web Crawls

The Internet Archive discovers and captures web pages through many different web crawls.



1,672,675 results


Wide Crawl started September 2010
collection
332 items
16.1M views

Web wide crawl with initial seedlist and crawler configuration from September 2010
Live Web Proxy Crawls
web

eye 8.3M

favorite 0

comment 0

Live Web Proxy Crawls
web

eye 1.1M

favorite 0

comment 0

YouTube 2007 Crawl
collection
73,847 items
14.9M views

Data crawled from YouTube.com in 2007 by Internet Archive. These files are not currently accessible.
Live Web Proxy Crawls
web

eye 20M

favorite 7

comment 1

Internet Archive Liveweb Capture from WaybackMachine, captured by wwwb-proxy0.us.archive.org:wbm from Sun Mar 27 22:10:09 PDT 2011 to Mon Mar 28 05:27:05 PDT 2011.
1 review
Topic: crawldata
World Wars Crawl
collection
13 items
14.5M views

Web data related to World Wars I and II, collected by Internet Archive in an experimental crawl sponsored by the National Endowment for the Humanities and JISC. This data is currently not publicly accessible.
Shallow Crawl Started February 2013
collection
86 items
17.8M views

Shallow crawl started February 2013 that collects content 1 level deep, including embeds. Access to content is restricted. Please visit the Wayback Machine to explore archived web sites.
2004 Election
collection
178 items
14M views

2004 Election crawl performed by Internet Archive. This data is currently not publicly accessible.
Live Web Proxy Crawls
web

eye 11.5M

favorite 0

comment 0

Live Web Proxy Crawls
web

eye 3.8M

favorite 0

comment 0

Live Web Proxy Crawls
web

eye 1.9M

favorite 0

comment 0

Wide Crawl Number 17: Started August 3rd, 2018
web

eye 5.5M

favorite 1

comment 3

Internet Archive crawldata from Webwide Crawl, captured by crawl805.us.archive.org:wide from Mon May 13 17:55:38 PDT 2019 to Mon May 13 13:45:55 PDT 2019.
3 reviews
Topic: crawldata
Shallow Crawl Started November 2012
collection
137 items
11.2M views

Shallow crawl started November 2012 that collects content 1 level deep, including embeds. This data is currently not publicly accessible.
Hurricane Katrina
collection
112 items
10.7M views

Data related to Hurricane Katrina collected in 2005 by Internet Archive. This data is currently not publicly accessible. From Wikipedia: Hurricane Katrina was the deadliest and most destructive Atlantic hurricane of the 2005 Atlantic hurricane season. It was the costliest natural disaster, as well as one of the five deadliest hurricanes, in the history of the United States. Among recorded Atlantic hurricanes, it was the sixth strongest overall. At least 1,833 people died in the hurricane and...
Live Web Proxy Crawls
web

eye 9.9M

favorite 0

comment 0

End of Term 2016 Post-Inauguration Crawls
collection
5,308 items
8.3M views

This collection contains web crawls performed as the post-inauguration crawl for the End of Term Web Archive, a collaborative project that aims to preserve the U.S. federal government web presence at each change of administration. Content includes publicly accessible government websites hosted on .gov, .mil, and relevant non-.gov domains, as well as government social media materials. The web archiving was performed in the Winter of 2016 and Spring of 2017 to capture websites...
Topics: end of term, federal government, 2016, president, congress
Survey Crawl Number 8
web

eye 1.6M

favorite 0

comment 0

Internet Archive crawldata from Survey Webwide Crawl, captured by crawl836.us.archive.org:survey from Fri Feb 22 02:17:40 PST 2019 to Thu Feb 21 23:46:50 PST 2019.
Topic: crawldata
Live Web Proxy Crawls
web

eye 2.6M

favorite 0

comment 0

Live Web Proxy Crawls
web

eye 3M

favorite 0

comment 0

End of Term 2008 UNT Dot Gov Crawl
collection
282 items
10.8M views

End of Term 2008 crawl of .gov domains gathered by the University of North Texas. This data is currently not publicly accessible. UNT is a student-focused, public research university located in Denton, Texas.
Internet Archive crawldata from Survey Webwide Crawl, captured by crawl344.us.archive.org:survey from Sat Sep 30 10:02:40 PDT 2017 to Sat Sep 30 03:34:44 PDT 2017.
Topic: crawldata
Survey Crawl Number 8
web

eye 939,890

favorite 0

comment 0

Internet Archive crawldata from Survey Webwide Crawl, captured by crawl339.us.archive.org:survey from Sat Jan 12 18:55:42 PST 2019 to Sat Jan 12 19:13:13 PST 2019.
Topic: crawldata
Survey Crawl Number 8
web

eye 413,977

favorite 0

comment 0

Internet Archive crawldata from Survey Webwide Crawl, captured by crawl836.us.archive.org:survey from Mon Dec 10 09:04:48 PST 2018 to Mon Dec 10 06:28:38 PST 2018.
Topic: crawldata
Data crawled by National Endowment for the Humanities and JISC on behalf of Internet Archive from Fri Aug 08 00:17:40 PDT 2008 to Thu Jun 26 05:29:33 PDT 2008
Topic: crawldata
End Of Term 2016 Library of Congress Crawls
collection
3,892 items
6.4M views

End of Term 2016 Web Archive government web crawls by project partner the Library of Congress.
Topics: end of term, federal government, 2016, president, congress, library of congress, web, data
Wayback Robots Crawl
collection
129 items
7.8M views

Wayback robots.txt crawl performed by Internet Archive. This data is currently not publicly accessible.
Wide Crawl started February 2014
web

eye 8.4M

favorite 0

comment 0

Internet Archive crawldata from Webwide Crawl, captured by crawl453.us.archive.org:wide from Wed Feb 19 01:09:37 PST 2014 to Tue Feb 18 21:33:27 PST 2014.
Topic: crawldata
Wide Crawl started February 2014
web

eye 8.2M

favorite 0

comment 1

Internet Archive crawldata from Webwide Crawl, captured by crawl426.us.archive.org:wide from Wed Feb 19 07:58:38 PST 2014 to Wed Feb 19 05:13:46 PST 2014.
1 review
Topic: crawldata
Wide Crawl started February 2014
web

eye 8.3M

favorite 0

comment 0

Internet Archive crawldata from Webwide Crawl, captured by crawl420.us.archive.org:wide from Tue Feb 18 17:01:58 PST 2014 to Tue Feb 18 13:14:06 PST 2014.
Topic: crawldata
Wide Crawl started February 2014
web

eye 8.3M

favorite 0

comment 0

Internet Archive crawldata from Webwide Crawl, captured by crawl427.us.archive.org:wide from Wed Feb 19 09:49:01 PST 2014 to Wed Feb 19 06:07:15 PST 2014.
Topic: crawldata
Live Web Proxy Crawls
web

eye 8.3M

favorite 0

comment 0

Wide Crawl started February 2014
web

eye 8.3M

favorite 0

comment 0

Internet Archive crawldata from Webwide Crawl, captured by crawl454.us.archive.org:wide from Wed Feb 19 05:20:19 PST 2014 to Wed Feb 19 01:54:33 PST 2014.
Topic: crawldata
Source: ximm-collections-news-crawls-v3
Live Web Proxy Crawls
web

eye 412,496

favorite 0

comment 0

Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-liveweb.us.archive.org from 2012-06-04T16:05:30 UTC to 2012-06-05T00:00:01 UTC.
Topic: crawldata
Wide Crawl Number 17: Started August 3rd, 2018
web

eye 951,506

favorite 0

comment 0

Internet Archive crawldata from Webwide Crawl, captured by crawl806.us.archive.org:wide from Tue Oct 9 21:09:32 PDT 2018 to Tue Oct 9 18:23:47 PDT 2018.
Topic: crawldata
Live Web Proxy Crawls
web

eye 75,632

favorite 0

comment 0

Live Web Proxy Crawls
web

eye 1.1M

favorite 0

comment 0

Live Web Proxy Crawls
web

eye 351,456

favorite 0

comment 0

Live Web Proxy Crawls
web

eye 7.2M

favorite 0

comment 0

Internet Archive Liveweb Capture from WaybackMachine, captured by wwwb-proxy0.us.archive.org:wbm from Mon Mar 28 12:43:47 PDT 2011 to Mon Mar 28 16:56:17 PDT 2011.
Topic: crawldata
UNT Web
collection
35 items
6.6M views

This collection contains all collaborative crawl data contributed by University of North Texas (UNT).
Topics: UNT, web, texas, eot
2004 Indian Ocean earthquake and tsunami
collection
42 items
6.8M views

Data related to the 2004 Indian Ocean earthquake and tsunami collected by Internet Archive. This data is currently not publicly accessible.
Top 150 Crawl
collection
30 items
6.7M views

Top 150 Alexa sites crawl performed by Internet Archive. This data is currently not publicly accessible.
Live Web Proxy Crawls
web

eye 5.4M

favorite 0

comment 0

Live Web Proxy Crawls
web

eye 361,616

favorite 0

comment 0

Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live0.us.archive.org from 2012-11-26T02:49:24 UTC to 2012-11-26T11:32:59 UTC.
Topic: crawldata
Live Web Proxy Crawls
web

eye 2.5M

favorite 0

comment 0

Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live3.us.archive.org from 2013-07-14T21:39:49 UTC to 2013-07-15T19:15:09 UTC.
Topic: crawldata
Live Web Proxy Crawls
web

eye 387,745

favorite 0

comment 0

Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live1.us.archive.org from 2012-11-26T10:28:21 UTC to 2012-11-26T21:41:23 UTC.
Topic: crawldata
Data crawled by Internet Archive on behalf of Internet Archive from Fri Nov 01 06:23:33 PDT 2002 to Tue Nov 19 23:24:07 PDT 2002
1 review
Topic: crawldata
Live Web Proxy Crawls
web

eye 1.5M

favorite 0

comment 0

UK Government Site Crawl
collection
107 items
6.1M views

Collaborative closure crawl of British government sites performed by Internet Archive. This data is currently not publicly accessible. From Wikipedia: GOV.UK is a United Kingdom public sector information website, created by the Government Digital Service to provide a single point of access to HM Government services.
Live Web Proxy Crawls
web

eye 3.1M

favorite 0

comment 0

Live Web Proxy Crawls
web

eye 942,398

favorite 0

comment 0

Data crawled by Internet Archive on behalf of Internet Archive from Mon Oct 26 17:21:45 PDT 2009 to Mon Oct 26 17:58:41 PDT 2009
Topic: crawldata
Live Web Proxy Crawls
web

eye 1.3M

favorite 0

comment 0

Internet Archive crawldata from Webwide Crawl, captured by crawl426.us.archive.org:wide from Wed Jun 29 15:44:55 PDT 2016 to Wed Jun 29 10:43:30 PDT 2016.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl813.us.archive.org:wide from Wed Jun 29 17:31:53 PDT 2016 to Wed Jun 29 15:11:43 PDT 2016.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl835.us.archive.org:wide from Thu Jun 30 10:52:08 PDT 2016 to Thu Jun 30 06:07:26 PDT 2016.
Topic: crawldata
Data crawled by Internet Archive on behalf of Internet Archive from Sat Sep 18 12:46:38 PDT 2004 to Thu May 05 09:34:36 PDT 2005
Topic: crawldata
Live Web Proxy Crawls
web

eye 858,679

favorite 0

comment 0

Live Web Proxy Crawls
web

eye 6.5M

favorite 0

comment 0

Live Web Proxy Crawls
web

eye 773,775

favorite 0

comment 0

Internet Archive crawldata from Webwide Crawl, captured by crawl429.us.archive.org:wide from Thu Apr 7 15:00:19 PDT 2016 to Thu Apr 7 09:30:57 PDT 2016.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl825.us.archive.org:wide from Thu Apr 7 11:34:12 PDT 2016 to Thu Apr 7 06:58:33 PDT 2016.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl429.us.archive.org:wide from Thu Apr 7 12:21:53 PDT 2016 to Thu Apr 7 07:27:22 PDT 2016.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl812.us.archive.org:wide from Thu Apr 7 17:22:05 PDT 2016 to Thu Apr 7 13:08:49 PDT 2016.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl803.us.archive.org:wide from Thu Apr 7 12:24:26 PDT 2016 to Thu Apr 7 08:51:27 PDT 2016.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl812.us.archive.org:wide from Thu Apr 7 12:21:12 PDT 2016 to Thu Apr 7 07:19:38 PDT 2016.
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl421.us.archive.org:wide from Thu Apr 7 12:12:12 PDT 2016 to Thu Apr 7 07:26:33 PDT 2016.
Topic: crawldata
Live Web Proxy Crawls
web

eye 5.8M

favorite 0

comment 0

Internet Archive Liveweb Capture from WaybackMachine, captured by wwwb-proxy0.us.archive.org:wbm from Tue Mar 29 00:12:24 PDT 2011 to Tue Mar 29 07:24:41 PDT 2011.
Topic: crawldata
Data crawled by Internet Archive on behalf of Internet Archive from Fri Sep 26 12:13:03 PDT 2003 to Wed Dec 10 23:25:51 PDT 2003
Topic: crawldata
Live Web Proxy Crawls
web

eye 3.1M

favorite 0

comment 0

Live Web Proxy Crawls
web

eye 1.8M

favorite 0

comment 0

Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live4.us.archive.org from 2013-08-11T22:27:04 UTC to 2013-08-13T04:44:55 UTC.
Topic: crawldata
Live Web Proxy Crawls
web

eye 83,534

favorite 0

comment 0

Host Screen Captures
web

eye 1.6M

favorite 0

comment 0

Internet Archive crawldata from Webwide Crawl, captured by crawl431.us.archive.org:widewebcap from Tue Jan 17 22:09:27 UTC 2012 to Wed Feb 1 11:19:58 UTC 2012.
Topic: crawldata
Live Web Proxy Crawls
web

eye 457,766

favorite 0

comment 0

Live Web Proxy Crawls
web

eye 189,834

favorite 0

comment 0